Methylation pattern reconstruction problem
methylFlow: http://github.com/hcorrada/methylFlow
The statistic: number of reads in genomic region
The model: expected number of reads in genomic region
\[ \mathbb{E} y_v = \sum_{u:(v,u) \in E} \ell_{vu} \sum_{p:(v,u)\in p} \theta_p \]
The estimator
\[ \min_{\theta_p} \sum_v \lvert y_v - \sum_{u:(v,u)\in E} \ell_{vu} \sum_{p:(v,u)\in p} \theta_p \rvert + \lambda \sum_p \lvert \theta_p \rvert \]
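As a rough illustration of the model and the estimator above, the sketch below evaluates the expected read counts and the penalized absolute-error objective for a fixed set of candidate patterns. The toy graph, the node names (`a`, `b`, `c`, and a sink `t`), the edge weights, and the function names are all assumptions made for this example; they are not taken from methylFlow.

```python
# Hypothetical toy region graph: every edge weight l_vu is set to 1.0.
edge_weight = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "t"): 1.0, ("c", "t"): 1.0}

def expected_counts(paths, theta, edge_weight):
    """E[y_v]: for each out-edge (v, u), add l_vu * theta_p for every path p using it."""
    expected = {}
    for path, abundance in zip(paths, theta):
        for v, u in zip(path, path[1:]):
            expected[v] = expected.get(v, 0.0) + edge_weight[(v, u)] * abundance
    return expected

def objective(y, paths, theta, edge_weight, lam):
    """Sum of absolute residuals plus lam times the L1 norm of the abundances."""
    expected = expected_counts(paths, theta, edge_weight)
    fit = sum(abs(y[v] - expected.get(v, 0.0)) for v in y)
    penalty = lam * sum(abs(t) for t in theta)
    return fit + penalty

# Two candidate patterns (paths) with made-up abundances and observed counts.
paths = [("a", "b", "t"), ("a", "c", "t")]
theta = [3.0, 1.0]
y = {"a": 4.0, "b": 2.0, "c": 1.0}

print(expected_counts(paths, theta, edge_weight))
print(objective(y, paths, theta, edge_weight, lam=0.5))
```

Enumerating paths explicitly like this only works for tiny graphs, since the number of candidate paths can grow exponentially with graph size.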
How to solve efficiently
\[ \min_{\theta_p} \sum_v \lvert y_v - \sum_{u:(v,u)\in E} \ell_{vu} \sum_{p:(v,u)\in p} \theta_p \rvert + \lambda \sum_p \lvert \theta_p \rvert \]
If we interpret the abundance \(\theta_p\) as the flow along path \(p\), then we can rewrite the problem in terms of edge flows
\[ f_{vu} = \sum_{p:(v,u) \in p} \theta_p \]
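A minimal sketch of this change of variables, assuming a toy graph with an artificial source `s` and sink `t` (the node names, abundances, and helper functions are hypothetical): each edge flow is the sum of the abundances of the paths through that edge, and the resulting flows balance at every interior node.

```python
from collections import defaultdict

def edge_flows(paths, theta):
    """f_vu = sum of theta_p over all paths p that traverse edge (v, u)."""
    flow = defaultdict(float)
    for path, abundance in zip(paths, theta):
        for edge in zip(path, path[1:]):
            flow[edge] += abundance
    return dict(flow)

def conservation_gap(flow, node):
    """Outflow minus inflow at a node; zero for interior nodes of a path decomposition."""
    inflow = sum(f for (w, v), f in flow.items() if v == node)
    outflow = sum(f for (v, u), f in flow.items() if v == node)
    return outflow - inflow

# Two made-up patterns sharing the prefix s -> a.
paths = [("s", "a", "b", "t"), ("s", "a", "c", "t")]
theta = [3.0, 1.0]

flow = edge_flows(paths, theta)
print(flow)
print([conservation_gap(flow, v) for v in ("a", "b", "c")])
```

The number of edge-flow variables equals the number of edges in the graph, whereas the number of path variables can be far larger.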
\[ \begin{aligned} \min_{f \geq 0} \quad & \sum_v \lvert y_v - \sum_{u:(v,u)\in E} \ell_{vu} f_{vu} \rvert + \lambda f_{vt} \\ \textrm{s.t.} \quad & \sum_{u:(v,u) \in E} f_{vu} = \sum_{w:(w,v) \in E} f_{wv} \end{aligned} \]
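One standard way to see why the edge-flow form can be solved efficiently is to linearize the absolute values, which turns it into a linear program. The slack variables \(e_v\) below are introduced for this illustration only; this is a generic reformulation and not necessarily the one methylFlow uses.

\[ \begin{aligned} \min_{f \geq 0,\; e \geq 0} \quad & \sum_v e_v + \lambda f_{vt} \\ \textrm{s.t.} \quad & -e_v \leq y_v - \sum_{u:(v,u)\in E} \ell_{vu} f_{vu} \leq e_v \\ & \sum_{u:(v,u) \in E} f_{vu} = \sum_{w:(w,v) \in E} f_{wv} \end{aligned} \]

At an optimum each \(e_v\) equals the absolute residual at \(v\), so the two problems have the same minimizers in \(f\). The number of variables and constraints grows with the number of nodes and edges rather than with the number of paths.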
Pattern reconstruction from whole genome bisulfite sequencing
Dataset of 50bp reads from mouse wild-type activated B cells and two types of progenitor cells (CLP and KSL).
Reconstruct patterns 4-100x the read length (in base pairs)
Reconstruct patterns with accurate marginal estimates
Pattern reconstruction from targeted bisulfite sequencing
Compare patterns across samples and populations